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Abstract 

We propose an aJgorithm for evaluation of the cumulative bivariate 
normal distribution, building upon Marsaglia's ideas for evaluation of the 
cumulative univariate normal distribution. The algorithm is mathemat- 
ically transparent, delivers competitive performance and can easily be 
extended to arbitrary precision. 

1 Introduction 

The cumulative normal distribution, be it univariate or multivariate, has to be 

evaluated numerically. There arc numerous algorithms available, many of these 
having been fine-tuned, leading to faster evaluation and higher accuracy but 
also to lack of mathematical transparency. 

For the univariate case, Marsaglia (2004) has proposed a very simple and 
intuitive but powerful alternative that is based on Taylor expansion of Mills' 
ratio or similar functions. In this note we will extend Marsaglia's approach 
to the bivariate case. This will require two steps: reduction of the evaluation 
of the cumulative bivariate normal distribution to evaluation(s) of a univariate 
function, i.e., to the cumulative bivariate normal distribution on the diagonal, 
and Taylor expansion of that function. Note that a similar approach, but with 
reduction to the axes instead of the diagonals, has been proposed by Vogt (2008). 

The resulting algorithm has to be compared with existing approaches. For 
overview on and discussion of the latter, cf. (Brctz & Genz 2009), (Agea & 
Chance 2003), (Terza & Welland 1991), and (Wang & Kennedy 1990). Most 
implementations today will rely on variants of the approaches of Divgi (1979) 
or of Drezner & Wesolowsky (1990). Improvements of the latter method have 
been provided by Genz (2004) and West (2005). The method of Drezner (1978), 
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although less reliable, is also very common, mainly because it is featured in 
(Hull 2008) and other prevalent books. 

It will turn out that the algorithm proposed in this paper is able to de- 
liver near double precision (in terms of absolute error) using double arithmetic. 
Furthermore, implementation of the algorithm using high-precision libraries is 
straightforward; indeed, a quad-double implementation has been applied for 
testing purposes. Performance is competitive, and trade-offs between speed and 
accuracy may be implemented with little effort. 



2 Theory 

In this section we are going to develop the algorithm. In order to keep the 
presentation lean we will often refer to the author's recent survey (Meyer 2009). 
For further background on normal distributions the reader is also referred to text 
books such as (Balakrishnan & Lai 2009), (Kotz, Balakrishnan & Johnson 2000) 
and (Patel & Read 1996). 



2.1 Evaluation on the diagonal 

Denote by 

the density and distribution function of the standard normal distribution. Mills' 
ratio is then defined as 

ip{x) yy{-x) 

Furthermore, denote by 

1 ( x^ - 2gxy + y"' 

^^^"'^'^^ ^= WT^"' r 2(1-,^) 

*2(a;,j/;£>) := / / tp2{s,t; g) dt ds, 

J — CO J — CO 



the density and distribution function of the bivariate standard normal distribu- 
tion with correlation parameter g G (—1, 1). We will also write 

f2{x; g) := (p2{x, x; g) = ^ exp 



277^/1 - V 1 + e 

$2(3;; g) := ^2(x,x-, g), 



\{g) :- 



1 + g' 
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We are going to use the following properties: 

^(P2{x; q) = --^^ • ^2{x] q), 
ax I + g 

ip2{x; g) ■ \/l - £)2 = ip(x) ■ v>{X{q) ■ x) 

In the following we will assume that a; < 0, ^ > 0. In this case the following 
bounds apply (cf. (Meyer 2009, Th. 5.2)): 

1 + - arcsin(e) < ^ + S (2-1) 

Furthermore, as is proven implicitly in (Meyer 2009, App. A. 2), 

X — !■- cx) $(x) • $(A(£i) • a;) 

Now we define 

{l + g)-^x)-nX{g)-x)-Mx;g) _ 

f2{x;g) 

Starting with 

D'{x) = {g-l)- • R{-X{g) ■x) + {l- e^) • R{-x) + ■ D{x). 

1 + g 

we find the recursion 

iP«(a:) = {q-1)- • {-X{Q)f-^ ■ R^'-'X-Xig) ■ x) 

+ (l-f?2).(_i)fc-i.i?(fe-i)(_a;) 

+ tlil . D(''-'\x) + ■ D('^-'\x) 
1+g ^ ' 1+g ^ ' 

which we can use to r ecursively evaluate the Taylor expansion of D around zero. 
Dividing by \J\ — ^ for convenience, we define 

afc ■.= ''^-{g-\)-{-X{e)f-R^^\G), 

fc+1 

:=^-yr^-(-i)^-ii('=no), 



dfe := 



Using 



i?('=)(x) = (A; - 1) • i?('=-2)(x) + X ■ R^^-^\x) 
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we derive the following recursion scheme: 

1 2 ^-Q 



ak = r ■ X ■— — ■ ak-2, (2.2) 

6fe = ^ • • bk-2, (2.3) 

1 / 2.^2 \ 

4 = ^ • ( afc-i + h-i + ■ dk~2] , (2.4) 



with initial values 

ao = {g-l)- ^-x, (2.5) 
ai = A(^^)•(e-l)•a;^ (2.6) 

bo = \/r^ • • (2-7) 
bi = ^/l^-x^ (2.8) 
do = — arcsin(£i), (2.9) 

di = + .yi-a;. (2.10) 

Here we have used that 

R{0) = $2(0; e) = ^ + ^ • arcsin(e). 

We can now compute $2(2;; q) numerically via 

$2(x; g) = {l + e)- ^x) ■ $(A(^?) . a;) - ^ • exp (-^) • (^E^*^) • 

Note that it would also have been possible to work with, e.g., one of the functions 

l-^2{x;g) 



D2{x) := 

D^{x) :-- 
Di{x) :■- 



V2{x;q) 
$2(0; g)- $2(3;; g) 
'P2{x;g) 

2 ■ ^x) ■ ^Xjg) ■ x) - <^2{x; g) 

(P2{x; g) 



instead. The resulting recursion schemes are in fact easier (two summands 
instead of three) but will be running into numerical problems (cancellation, or 
lower accuracy for x — > —00). 
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2.2 Reduction to the diagonal 

In order to apply the results from Section |2.1| to the numerical evaluation of 
^2(2^, for general x, y and g, we start with the symmetric formula (cf. 
(Meyer 2009, Eq. (3.16))) 



^2{x,y;g) = $2(2;, 0; &) - 4 + $2(0, y; g^) - (5.y, 



(2.11) 



where 



^ _ J I' 2; < and y > 0, 
^ I 0, else, 



^, y < and x >0, 
0, else, 



and 



Qy = 



2 ' 



l + al 



From the axis ?y = to the diagonal x — y we get by applying the formula (cf. 
(Meyer 2009, Eq. (3.18))) 



$2(3;, 0;£-) 
Specifically, we obtain 



\-^2{x,x]1-2q'^), 



£» < 0, 



$(x)- i •$2(a;,2;;l-2(?2), g>Q. 



2-{bx- y f 
^2 _|_ y2 _ 2Qxy 



2 ■ CLx 

1 + a„ 



with 



QX-y 



X 



x-y 

^x ■ v/l - 
x + y 



v/r^ V 1-e 



1-g 
1 + 



(2.12) 
(2.13) 

(2.14) 
(2.15) 
(2.16) 



where in an implementation (2.15) should be used for g — > 1, and (|2.16|) for 
to avoid catastrophic cai 

Qx <0 ■<=^ > - > g- 



g — > —1, in order to avoid catastrophic cancellation. Note also that 

y 
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In a last step, if necessary to ensure x < and g > 0, we apply the formulas 
(of. (Meyer 2009, Eq. (2.15)) and (Meyer 2009, Eq. (3.27))) 



$2(2;, x;g) ^2- <^{x) - 1 + $2(-a;, -x; g), 

^2{x, x;g) = 2- $(.x) • $ {\{g) • x) - $2 (Hq) ■ x, X{g) ■ x, -g) 

Specifically, we obtain 

\\{l-2gl)-x\^\x\- 



el 



el 



\gx - 


y\ 




e' 


X — 


V 




-e" 


x + 


V 




-g^ 



where in an implementation (2.20) should be used for g 
g — > —1, in order to avoid catastrophic cancellation. 
It will be favorable to work with 



2 ■ ax 

1 + a. 



/I 


- e 




+ e 


/I 





\-g 



(2.17) 
(2.18) 



(2.19) 
(2.20) 
(2.21) 



1, and (2.21) for 



instead of 1 — 2(?^. If (2.18) has to be applied (i.e., if 1 — 2g^ < 0, which is 
equivalent with > 1), correspondingly we will work with 



l^{-{l~2gl)) 



2gl 



(2.22) 



3 Implementation 

In the following we will discuss implementation of the algorithm derived in 
Section[2j The C++ language has been chosen because it is the market standard 
in quantitative finance, one of the fields frequently requiring evaluation of normal 
distributions. 



3.1 Evaluation on the diagonal 



Source code (in C++) for evaluation of $2(2;; ^?) as in Section 2.1 for x < 



and p > 0, is provided in Figure [T] In the following we will comment on some 
details of the implementation. 



(2.10) show that it is reasonable to provide a := 1 — g, 
as input for the evaluation of $2(2^; e)- Moreover, cf. (2.13) and 
double inversion (i.e., computation of 1 — (1 



Equations (2.2) 
instead of g 



z) instead of z) is to be 



avoided in the reduction algorithm. 
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Values for and for $(A(g) • x) are also expected as input parameters. 

This makes sense because the values are needed by the reduction algorithm as 
well (and hence should not be computed twice). 

Evaluation of arcsin(£i) is to be avoided for g — > 1 and has been replaced 
(without optimization of the cutoff point) by 



arcsin(^)) = arccos ( ^1- \ ~ arccos(A(^))). 



Note that X{g) has to be computed anyway. 

Constants (all involving tt) have been pre-computed in double precision. 
The recursion stops if a new term does not change the computed sum. If the 



a priori bound for the absolute error, given by ( |2.1[ ), is less than 5 • 10~^^, the 
upper bound is returned (relative accuracy on the diagonal may be increased 
by dropping this condition but overall relative accuracy will still be determined 



by the reduction to the diagonal, cf. Section 3.2), and by the accuracy of the 
implementation of $. The final result is always checked against the upper and 
lower bound. 

Note that d2k and d2k+i have different sign but comparable order. Bracket- 
ing them before summation can therefore reduce cancellation error. 

3.2 Reduction to the diagonal 



Source code (in C++) for evaluation of $2(3^; 9) as in Equation (2.11) is pro- 
vided in Figure |3j and source code for evaluation of $2(2^1 &) ~ is provided 
in Figure [2j In the following we will comment on some details of the implemen- 
tation. 

The special cases |e| = 1 and x — y — are dealt with in Phi2() . Therefore, 
in Phi2help() there is no check against 1 . - rho == 0.0,1.0 + rho == 0.0 
or s == 0.0. 

It is assumed that sqr(x) evaluates x*x. The cutoff points \g\ — 0.99 have 
been set by visual inspection and might be optimized. 



4 Discussion 

Evaluation of <1'2(2;, y; p) as in Section [S] will require (at most) four calls to an 
implementation of the cumulative standard normal distribution $ (Phi ( ) in 
the code). The actual choice may well determine both accuracy and running 
time of the algorithm. For testing purposes I have been using a hybrid method, 
calling the algorithm from (West 2005, Fig. 2) for absolute value larger than 
0.5, and PhiO from (Marsaglia 2004) else. Besides PhiO, expO will be called 
two times, arcsinO or arccos () two times, and sqrtO six times. Everything 
else is elementary arithmetic. 

Due to the reduction algorithm, the final result will be a sum. Therefore, 
very high accuracy in terms of relative error can not be expected. Consequently, 
evaluation of the diagonal aims at absolute error as well. 
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double Phi2d.iag( const doublefe x, 

const doublefe a, //I - rho 

const doubleSs px, // Phi( x ) 

const doubleSt pxs ) // Phi( lambda ( rho ) * x 



if ( a <= 0.0 ) return px ; // rho == 1 

if ( a >= 1.0 ) return px * px; // rho == 

double b = 2.0 - a, sqrt_ab = sqrt( a * b ); 

double asr = ( a > 0.1 ? asin( 1.0 - a ) : acos( sqrt_ab ) ); 
double comp = px * pxs ; 

if ( comp * ( 1.0 - a - 6.36619772367581343e-001 * asr ) < 5e-17 ) 
return b * comp; 



double tmp = 1.25331413731550025 * x; 
double a_coeff =a*x*x/b; 
double a_even = -tmp * a; 
double a_odd = -sqrt_ab * a_coef f ; 
double b_coeff = x * x; 
double b_even = tmp * sqrt_ab; 
double b_odd = sqrt_ab * b.coef f ; 
double d_coeff =2.0*x*x/b; 

double d.even = ( 1.0 - a ) * 1.57079632679489662 - asr; 
double d_odd = tmp * ( sqrt_ab - a ) ; 

double res = 0.0, res_new = d_even + d_odd; 
int k = 2; 

while ( res != res_new ) 
{ 

d_even = ( a_odd + b_odd + d_coeff * d_even ) / k; 
a_even *= a_coeff / k; 
b_even *= b_coeff / k; 
k++; 

a_odd *= a_coeff / k; 
b_odd *= b_coeff / k; 

d_odd = ( a_even + b_even + d_coeff * d_odd ) / k; 
k++; 

res = res_new; 

res_new += d_even + d_odd; 

> 

res *= exp( -X * X / b ) * 1.591549430918953358e-001; 
return max( ( 1.0 + 6 . 36619772367581343e-001 * asr ) * comp, 
b * comp - max( 0.0, res ) ); 

} 



Figure 1: C++ source code for evaluation of ^2{x; g) 
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double Phi2help( const doublefe x, 
const doubleSt y, 
const doubleSt rho ) 

{ 

if( X == 0.0 ) return (y>=0.0?0.0:0.5); 

double s = sqrt ((1.0- rho ) * ( 1.0+ rho ) ) ; 

double a = 0.0, bl = -fabs( x ), b2 = 0.0; 

if( rho > 0.99 ) 

i 

double tmp = sqrt ((1.0- rho ) / ( 1.0+ rho ) ) ; 

b2 = -fabs( (x-y) /s-x* tmp ); 
a = sqr ((x-y)/x/s- tmp ) ; 

> 

else if( rho < -0.99 ) 
{ 

double tmp = sqrt ( ( 1.0+ rho ) / ( 1.0- rho ) ) ; 
b2 = -fabs( (x+y) /s-x* tmp ); 
a = sqr ((x+y)/x/s- tmp ) ; 

> 

else 
{ 

b2 = -fabs( rho * x - y ) / s; 
a = sqr ( b2 / x ) ; 

> 

double pi = Phi( bl ), p2 = Phi( b2 ) ; // cum. standard normal 

double q = 0.0; 

if( a <= 1.0 ) 

q = 0.5 * Phi2diag( bl, 2.0*a/ ( 1.0+a), pi, p2); 

else 

q=pl*p2-0.5* Phi2diag( b2, 2.0 / ( 1.0+a ), p2, pi ); 

int cl = ( y / X >= rho ) ; 
int c2=(x<0.0); 
int c3 = c2 M ( y >= . ) ; 
return ( cl StSt c3 ? q - 0.5 

: cl && c2 ? q 

: cl ? 0.5 - pi + q 

: c3 ? pi - q - 0.5 

: c2 ? pi - q 

: 0.5 - q ); 



Figure 2: C++ source code for evaluation of $2(2;, 0; Qx) — 4 



9 



double Phi2( const doublefc x, 
const doublefe y, 
const doublefe rho ) 

{ 

if( ( 1.0 - rho ) * ( 1.0 + rho ) <= . ) // I rho I ==1 
if( rho > 0.0 ) 

return Phi( min( x, y ) ); 

else 

return max( 0.0, min( 1.0, Phi( x ) + Phi( y ) - 1.0 ) ); 

if( X == 0.0 M y == 0.0 ) 
if( rho > 0.0 ) 

return Phi2diag( 0.0, 1.0 - rho, 0.5, 0.5 ); 

else 

return 0.5 - Phi2diag( 0.0, 1.0 + rho, 0.5, 0.5 ); 

return max ( 0.0, 
min( 1.0, 

Phi2help( x, y, rho ) + Phi2help( y, x, rho ) ) ); 



Figure 3: C++ source code for evaluation of $2(2^, y; q) 



The Phi2diag() function is behaving as it may be expected from an ap- 
proximation by a Taylor series around zero: (absolute) error increases with 
decreasing x. For x < —7 (or g — > or g — > 1) the error bounds from (2.1 1 
are taking over, and absolute error decreases again. The maximum absolute 
error is obtained for x « —7, g ~ 0.8 (maximum error of the upper bound is 
obtained for g = ^1 - A/n^ « 0.7712, cf. (Meyer 2009, Th. 5.2)). 

In general, assuming that all numerical fallacies in the reduction algorithm 
have been taken care of, the diagonal is expected to provide a worst case because 
the errors of the two calls to Phi2diag() will not cancel. With respect to the 
reduction algorithm, the case gx ~ y, gy ~ x, implying \g\ w 1, is most critical. 

In order to give an impression of the algorithm's behaviour, we will discuss 
the results of a simulation study. For each n g {0, . . . , 200}, me {1, . . . , 10®}, 
the value of $2(a;mi2/m; Bm) has been computed via the Phi2() function from 
Figure [3] where xj^ has been drawn from a uniform distribution on [x"— 0.05, a;"+ 
0.05] with x" := n/10 — 10, yj^ has been drawn from a uniform distribution on 
[—10, 10], and g^ := 2$(rJ^) — 0.5 where rj^ has been drawn from a uniform 
distribution on [—10, 10] as well. 

The C++ implementation from (West 2005) has been serving as a com- 
petitor. Both functions have been evaluated against a quad-double precision 
version of Phi2(), implemented using the QD library (Bailey, Hida, Li & 
Thompson 2010) and quad-double precision constants. 

The diagram in Figure [i] is displaying, for n € {0, . . . , 200}, the 99% quan- 
tile and the maximum of the absolute difference between the double precision 
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Figure 4: Absolute error of implementations of $2 in a simulation study 



algorithms (Phi2 and West) and the quad-double precision algorithm. 

Apart from a shift due to subtractions for positive x, errors of Plii2 are 
rather symmetric around zero. The peaks at |a;"| ~ 7 are due to the Taylor 
expansion around zero; the peaks at |a;"| « 2 are due to Taylor expansion after 
transformation of the argument. The characteristics of the 99% quantile, in 
particular the little peaks at w 0.7, are already visible in the error of the 
$ function used. The maximum error of West almost always stays below the 
one of Ph.i2. Note that the maximum error of West is determined by the case 
g — > —1 and might be reduced by careful consideration of that case. 

In the simulation study, Phi2 was a little slower than West: it took approx- 
imately five minutes and four minutes to perform the 201 • 10^ evaluations on 
a fairly standard office PC (and it took two days to perform the correspond- 
ing quad-double precision evaluations). The number of recursion steps used by 
Phi2d.iag is increasing with |a;|. Because of the mathematical transparency of 
the algorithm it should be easy to find an appropriate trade-off between speed 
and accuracy by replacing the condition terminating the recursion. 
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